While current studies on complex networks focus on systems that change relatively slowly in time, 
the structure of the most visited regions of the Web is altered at the timescale from hours to days. 
Here we investigate the dynamics of visitation of a major news portal, representing the prototype 
for such a rapidly evolving network. The nodes of the network can be classified into stable nodes, 
that form the time independent skeleton of the portal, and news documents. The visitation of the 
two node classes is markedly different: the skeleton acquires visits at a constant rate, while a 
news document's visitation peaks after a few hours. We find that the visitation pattern of a 
news document decays as a power law, in contrast with the exponential prediction provided by 
simple models of site visitation. This is rooted in the inhomogeneous nature of the browsing pattern 
characterizing individual users: the time interval between consecutive visits by the same user to the 
site follows a power law distribution, in contrast with the exponential expected for Poisson processes. 
We show that the exponent characterizing the individual user's browsing patterns determines the 
power-law decay in a document's visitation. Finally, our results document the fleeting quality of 
news and events: while fifteen minutes of fame is still an exaggeration in the online media, we find 
that access to most news items significantly decays after 36 hours of posting. 



The recent interest in the topological properties of 
complex networks is driven by the realization that under- 
standing the evolutionary processes responsible for net- 
work formation is crucial for comprehending the topolog- 
ical maps describing many real systems [1-9]. A much 
studied example is the WWW, whose topology is driven 
by its continued expansion through the addition of new 
documents and links. This growth process has inspired 
a series of network models that reproduce some of the 
most studied topological features of the Web [10-17]. 
The bulk of the current topology driven research focuses 
on the so called publicly indexable web, which changes 
only slowly, and therefore can be reproduced with rea- 
sonable accuracy. In contrast, the most visited portion 
of the WWW, ranging from news portals to commercial 
sites, change within hours through the rapid addition and 
removal of documents and links. This is driven by the 
fleeting quality of news: in contrast with the 24-hour 
news cycle of the printed press, in the online media the 
non-stop stream of new developments often obliterates 
an event within hours. But the WWW is not the only 
rapidly evolving network: the wiring of a cell's regulatory 
network can also change very rapidly during cell cycle or 
when there are rapid changes in environmental and stress 
factors [7]. Similarly, while in social networks the cumulative 
number of friends and acquaintances an individual 
has is relatively stable, an individual's contact network, 
representing those that it interacts with during a given 
time interval, is often significantly altered from one day 
to the other. Given the widespread occurrence of these 
rapidly changing networks, it is important to understand 
their topology and dynamical features. 

Here we take a first step in this direction by studying 
as a model system a news portal, consisting of news items 
that are added and removed at a rapid rate. In particular, 
we focus on the interplay between the network and 
the visitation history of the individual documents. In 
this context, users are often modeled as random walk- 
ers, that move along the links of the WWW. Most re- 
search on diffusion on complex networks [18-26] ignores 
the precise timing of the visit to a particular web docu- 
ment. There are good reasons for this: such topological 
quantities as mean free path or probability of return to 
the starting point can be expressed using the diffusion 
time, where each time step corresponds to a single dif- 
fusion step. Other approaches assume that the diffusion 
pattern is a Poisson process [27], so that the probabil- 
ity of an HTML request in a dt time interval is pdt. In 
contrast, here we show that the timing of the browsing 
process is non-Poisson, which has a significant impact on 
the visitation history of web documents as well. 



I. DATASET AND NETWORK STRUCTURE 

Automatically assigned cookies allow us to reconstruct 
the browsing history of approximately 250,000 unique 
visitors of the largest Hungarian news and entertainment 
portal (origo.hu), which provides online news and maga- 
zines, community pages, software downloads, free email 
and search engine, capturing 40% of all internal Web traf- 
fic in Hungary. The portal receives 6,500,000 HTML hits 
on a typical workday. We used the log files of the portal 
to collect the visitation pattern of each visitor between 
11/08/02 and 12/08/02, the number of new news docu- 
ments released in this time period being 3,908. 

From a network perspective most web portals consist 
of a stable skeleton, representing the overall organization 
of the web portal, and a large number of news items that 
are documents only temporally linked to the skeleton. 
Each news item represents a particular web document 
with a unique URL. A typical news item is added to the 
main page, as well as to the specific news subcategories 
to which it belongs. For example, the report about an 
important soccer match could start out simultaneously 
on the front page, the sports page and the soccer subdirectory 
of the sports page. As a news document "ages", 
new developments compete for space, thus the document 
is gradually removed from the main page, then from the 
sports page and eventually from the soccer page as well. 
After some time (which varies from document to docu- 
ment) an older news document, while still available on 
the portal, will be disconnected from the skeleton, and 
can be accessed only through a search engine. To fully 
understand the dynamics of this network, we need to dis- 
tinguish between the stable skeleton and the news docu- 
ments with heavily time dependent visitation. 

The documents belonging to the skeleton are charac- 
terized by an approximately constant daily visitation pat- 
tern, thus the cumulative number of visitors accessing 
them increases linearly in time. In contrast, the visita- 
tion of news documents is the highest right after their 
release and decreases in time, thus their cumulative vis- 
itation reaches a saturation after several days. This is 
illustrated in Fig. 1, where we show the cumulative visi- 
tation for the main page (www.origo.hu/index.html) and 
a typical news item. 
FIG. 1. The cumulative number of visits to a typical skeleton 
document (a) and a news document (b). The difference 
between the two visitation patterns allows us to distinguish 
between news documents and the stable documents belonging 
to the skeleton. 



When visiting a news portal, we often get the impression 
that it has a hierarchical structure. As shown in 
Fig. 2 the skeleton forms a complex network, driving the 
visitation patterns of the users. Indeed, the main site, 
shown in the center, is the most visited, and the documents 
to which it directly links also represent highly 
visited sites. In general (with a few notable exceptions), 
the further we go from the main site on the 
network, the smaller the visitation. The skeleton of 
the studied portal has 933 documents with an average 
degree close to 2 (i.e. it is largely a tree, with only a few 
loops, confirming our impression of a hierarchical topology), 
the network having a few well connected nodes (or 
hubs), while many are linked to the skeleton by a single 
link [16,17]. 

FIG. 2. The skeleton of the studied web portal has 933 
nodes. The area of the circles assigned to each node in the 
figure is proportional to the logarithm of the total number 
of visits to the corresponding web document. The width 
of the links is proportional to the logarithm of the total 
number of times the hyperlink was used by the surfers 
on the portal. The central largest node corresponds to the 
main page (www.origo.hu/index.html), directly connected to 
several other highly visited sites. 

The difference between the two visitation patterns allows 
us to distinguish in an automated fashion the websites 
belonging to the skeleton from the news documents. 
For this we fit a linear regression to each site's cumulative 
visitation pattern and calculate the deviation from 
the fitted line, documents with very small deviations 
being assigned to the skeleton. The validity of the algorithm 
was checked by inspecting the URL of randomly 
selected documents, as the skeleton and the news documents 
in most cases have a different format. But given 
some ambiguities in the naming system, we used the visitation 
based distinction to finalize the classification of 
the documents into skeleton and news. 
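
The regression-based separation of skeleton and news documents can be illustrated with a minimal sketch. The text does not specify the deviation measure or the cutoff, so the relative root-mean-square residual, the threshold of 0.05, the number of grid points and the function names used here are assumptions made for illustration only.

```python
import numpy as np

def linearity_deviation(visit_times, t_start, t_end, n_points=100):
    """Relative RMS deviation of the cumulative visit curve from its best-fit line."""
    grid = np.linspace(t_start, t_end, n_points)
    # Cumulative number of visits recorded up to each grid time.
    cumulative = np.searchsorted(np.sort(visit_times), grid, side="right")
    slope, intercept = np.polyfit(grid, cumulative, 1)
    residuals = cumulative - (slope * grid + intercept)
    # Normalize by the final count so that documents of different popularity compare.
    return float(np.sqrt(np.mean(residuals ** 2)) / max(cumulative[-1], 1))

def classify(visit_times, t_start, t_end, threshold=0.05):
    """Label a document 'skeleton' if its cumulative visitation is nearly linear.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    deviation = linearity_deviation(visit_times, t_start, t_end)
    return "skeleton" if deviation < threshold else "news"
```

A document visited at a constant rate over the month yields a deviation near zero, while a document whose visits saturate within a few days yields a large deviation and is labeled as news.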



II. THE DYNAMICS OF NETWORK VISITATION 

Given that the difference between the skeleton and the 
news documents is driven by the visitation patterns, next 
we focus on the interplay between the visitation pattern 
of individual users and the overall visitation of a doc- 
ument. The overall visitation of a specific document is 
expected to be determined both by the document's position 
on the web page and by the content's potential 
importance for various user groups. In general the number 
of visits n(t) to a news document follows a dampened 
periodic pattern: the largest share of visits (28%) takes place 
within the first day, decaying to only 7% on the second 
day, and reaching a small but apparently constant visitation 
beyond four days (Fig. 3a). Given that after a day 
or two most news are archived, the long-term saturation 
of visitation corresponds to direct search or traffic from 
outside links. 
FIG. 3. (a) The visitation pattern of news documents on a 
web portal. The data represents an average over 3,908 news 
documents, the release time of each being shifted to day one, 
keeping the release hour unchanged. The first peak indicates 
that most visits take place on the release day, rapidly decaying 
afterward. (b) The same as plot (a), but to reduce the daily 
fluctuations we define the time unit as one web page request 
on the portal. (c) Logarithmically binned decay of visitation 
of (b) shown in a log-log plot, indicating that the visitation 
follows $n(t) \sim (t + t_0)^{-\beta}$, with $t_0 = 12$ and $\beta = 0.3 \pm 0.1$, 
shown as a continuous line on both (b) and (c). 



To understand the origin of the observed decay in visitation, 
we assume that the portal has N users, each reading 
the news documents of direct interest to them. 
Therefore, at every time step each user reads a given 
document with probability p. Users will not read the 
same news more than once, therefore the number of users 
who have not read a given document decreases with 
time. We can calculate the time dependence of the number 
of potential readers of a news document using 



$$\frac{d\mathcal{N}(t)}{dt} = -\mathcal{N}(t)\,p, \qquad (1)$$

where $\mathcal{N}(t)$ is the number of visitors who have not read 
the selected news document by time t. The probability 
that a new user reads the news document is given by 
$\mathcal{N}(t)p$. Equation (1) predicts that 

$$\mathcal{N}(t) = N \exp(-t/t_{1/2}), \qquad (2)$$

where $t_{1/2} = 1/p$, characterizing the halftime of the news 
item. The number of visits (n) in unit time is given by 

$$n(t) = -\frac{d\mathcal{N}(t)}{dt} = \frac{N}{t_{1/2}} \exp(-t/t_{1/2}). \qquad (3)$$
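
The exponential prediction of Eqs. (1)-(3) can be checked with a short sketch that compares the closed-form solution with a direct iteration of the per-step reading rule. The function names and the value p = 0.01 used in the usage note are illustrative assumptions; only the model itself comes from the text.

```python
import math

def exponential_model(n_users, p, t):
    """Closed-form Eqs. (2) and (3): unread users and visits per unit time at time t."""
    t_half = 1.0 / p  # the characteristic time t_{1/2} = 1/p of the text
    unread = n_users * math.exp(-t / t_half)
    visits = (n_users / t_half) * math.exp(-t / t_half)
    return unread, visits

def discrete_unread(n_users, p, steps):
    """Iterate the rule behind Eq. (1): at each step a fraction p of unread users reads."""
    unread = float(n_users)
    for _ in range(steps):
        unread -= unread * p
    return unread
```

For small p the two agree closely: with 250,000 users and p = 0.01, after 100 steps the iterated and closed-form numbers of unread users differ by well under one percent.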



Our measurements indicate, however, that the visitation does not 
decay exponentially; instead, its asymptotic behavior is best 
approximated by a power law (Fig. 3c) 

$$n(t) \sim t^{-\beta} \qquad (4)$$

with $\beta = 0.3 \pm 0.1$, so that while the bulk of the visits 
takes place at small t, a considerable number of visits are 
recorded well beyond the document's release time. 

Next we show that the failure of the exponential model 
is rooted in the uneven browsing patterns of the individual 
users. Indeed, Eqs. (1) and (2) are valid only if 
the users visit the site in a regular fashion, such that they 
all notice a newly added news document almost instantaneously. 
In contrast, we find that the time interval 
between consecutive HTML requests by the same visitor 
is not uniform, but follows a power law distribution, 
$P(\tau) \sim \tau^{-\alpha}$, with $\alpha = 1.2 \pm 0.1$ (Fig. 4a). 
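
One way to estimate such an exponent from a list of inter-visit times is a least-squares fit to a logarithmically binned histogram, sketched here. Logarithmic binning is mentioned in the caption of Fig. 3 for the visitation decay; whether the same procedure was used for the inter-visit times is an assumption, as are the function name and the number of bins.

```python
import numpy as np

def log_binned_exponent(intervals, n_bins=16):
    """Estimate alpha in P(tau) ~ tau^(-alpha) from a log-binned histogram."""
    x = np.asarray(intervals, dtype=float)
    x = x[x > 0]
    # Bin edges equally spaced on a logarithmic axis.
    edges = np.logspace(np.log10(x.min()), np.log10(x.max()), n_bins + 1)
    counts, _ = np.histogram(x, bins=edges)
    widths = np.diff(edges)
    centers = np.sqrt(edges[:-1] * edges[1:])  # geometric bin centers
    density = counts / (widths * x.size)       # counts per unit interval length
    keep = counts > 0                          # empty bins cannot be log-transformed
    slope, _ = np.polyfit(np.log10(centers[keep]), np.log10(density[keep]), 1)
    return -slope
```

Dividing the counts by the bin width matters: without it the fitted slope would be shifted by one, since logarithmic bins grow in proportion to the interval length.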
FIG. 4. (a) The distribution of time intervals between two 
consecutive visits of a given user. The cutoff for high $\tau$ 
($\tau \sim 10^6$) captures finite size effects, as time delays over a 
week are undercounted in the month long dataset. The continuous 
line has slope $\alpha = 1.2$. (b) The halftime distribution 
for individual news items, following a power law with exponent 
$-1.5 \pm 0.1$. 



This means that for each user numerous frequent 
downloads are followed by long periods of inactivity, a 
bursting, non-Poisson activity pattern that is a generic 
feature of human behavior [28] and is observed in many 
natural and human driven dynamical processes [29-40]. 
In the following we show that this uneven user visitation 
pattern is responsible for the slow decay in the visitation 
of a news document and that n(t) can be derived from 
the browsing pattern of the individual users. 

Let us assume that a given news document was released 
at time $t_0$ and that all users visiting the main page after 
the release read that news. Because each user reads each 
document only once, the visitation of a given document 
is determined by the number of new users visiting the 
page where the document is featured. 
FIG. 5. The browsing pattern of four users, every vertical 
line representing the time of a visit to the main page. The 
time a news document was released on the main page is shown 
at $t_0$. The thick vertical bars represent the first time the users 
visit the main page after the news document was released, i.e. 
the time they could first visit and read the article. 



In Fig. 5 we show the browsing pattern for four different 
users, each vertical line representing a separate visit 
to the main page. The thick lines show for each user 
the first time they visit the main page after the studied 
news document was released at $t_0$. The release time of 
the news ($t_0$) divides the time interval $\tau$ between two consecutive 
visits into two parts of length $t'$ and $t$, where $t + t' = \tau$. The 
probability that a user visits at time $t$ after the news was 
released is proportional to the number of possible $\tau$ intervals, 
which for a given $t$ is proportional to the possible 
values of $t'$, given by the number of intervals having a 
length larger than $t$, 

$$P(\tau > t) \sim \int_t^{\infty} \tau^{-\alpha}\, d\tau \sim t^{-\alpha+1}. \qquad (5)$$



If we have N users, each following similar browsing 
statistics, the number of new users visiting the main page 
and reading the news item in a unit time (n(t)) follows 

$$n(t) \sim N P(\tau > t) \sim N t^{-\alpha+1}. \qquad (6)$$



Equation (6) connects the exponent $\alpha$, characterizing 
the browsing pattern of individual users, to the exponent $\beta$ in Eq. (4), 
characterizing the decay in the news visitation, providing 
the relation 

$$\beta = \alpha - 1. \qquad (7)$$

This is in agreement with our measurements within the 
error bars, as we find that $\alpha = 1.2 \pm 0.1$ and $\beta = 0.3 \pm 0.1$. 
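
For completeness, the integration behind Eq. (5) and the numerical comparison can be written out explicitly. This is a sketch that assumes $\alpha > 1$, so that the integral converges, and that treats the quoted error bars as simple intervals.

```latex
\int_t^{\infty} \tau^{-\alpha}\, d\tau
  = \frac{t^{-(\alpha-1)}}{\alpha-1} \sim t^{-\beta},
  \qquad \beta = \alpha - 1 ,
\\[4pt]
\beta_{\mathrm{pred}} = 1.2 - 1 = 0.2 \pm 0.1 \;\Rightarrow\; [0.1,\, 0.3],
  \qquad
\beta_{\mathrm{meas}} = 0.3 \pm 0.1 \;\Rightarrow\; [0.2,\, 0.4] .
```

The predicted and measured intervals overlap between 0.2 and 0.3, which is the sense in which the two exponents agree within the error bars.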



To further test the validity of our predictions we studied 
the relationship between $\alpha$ and $\beta$ for the more general 
case, when a user that visits the main page reads a 
news item with probability p. We numerically generated 
browsing patterns for 10,000 users, the distribution of 
the time intervals between two consecutive visits, $P(\tau)$, 
following a power law with exponent $\alpha = 1.5$ (Fig. 6 
inset). 
FIG. 6. We numerically generated browsing patterns for 
10,000 users, the distribution of the time intervals between 
two consecutive visits by the same user following a power law 
with exponent $\alpha = 1.5$. We assume that users visiting the 
main page will read a given news item with probability p. 
The number of visits per unit time decays as a power law 
with exponent $\beta = 0.5$ for four different values of p (circles for 
p = 1, squares for p = 0.7, diamonds for p = 0.5 and triangles 
for p = 0.3). The empty circles represent the visitation of 
a news item if the users follow a Poisson browsing pattern. 
We keep the average time between two consecutive visits of 
each user the same as the one observed in the real data. As 
the figure indicates, the Poisson browsing pattern cannot 
reproduce the real visitation decay of a document, predicting 
a much faster (exponential) decay. 



In Fig. 6 we calculate the visits for a given news item, 
assuming that the users visiting the main page read the 
news with probability p, characterizing the "stickiness" 
or the potential interest in a news item. As we see in the 
figure, the value of $\beta$ is close to 0.5, as predicted by Eq. (7). 
Furthermore, we find that $\beta$ is independent of p, indicating 
that the inter-event time distribution $P(\tau)$ characterizing 
the individual browsing patterns is the main factor 
that determines the visitation decay of a news document, 
the difference in the content (stickiness) of the news playing 
no significant role. As a reference, we also determined 
the decay in the visitation assuming that the users follow 
a Poisson visitation pattern [27] with the same average inter-event 
time as observed in the real data. As Fig. 6 shows, 
a Poisson visitation pattern leads to a much faster decay 
in document visitation than the power law seen in Fig. 
3c. Indeed, using a Poisson inter-event time distribution in 
Eq. (5) would predict an exponentially decaying tail for n(t). 
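
A numerical experiment of this kind can be sketched as follows. The text gives only the number of users, the exponent and the reading probability; the minimum interval of one time unit, the number of visits generated per user, the release time and the function name are assumptions made for this illustration.

```python
import numpy as np

def simulate_first_reads(n_users=10_000, n_visits=200, alpha=1.5,
                         p=1.0, t0=1_000.0, seed=0):
    """Delay between release and reading for users with power-law inter-visit times."""
    rng = np.random.default_rng(seed)
    # Inverse-transform sampling of P(tau) ~ tau^(-alpha) for tau >= 1.
    u = 1.0 - rng.random((n_users, n_visits))  # uniform on (0, 1]
    gaps = u ** (-1.0 / (alpha - 1.0))
    visits = np.cumsum(gaps, axis=1)           # visit times of each user
    # A visit after the release at t0 leads to a read with probability p.
    reads = (visits > t0) & (rng.random(visits.shape) < p)
    has_read = reads.any(axis=1)
    first = reads.argmax(axis=1)               # index of each user's first read
    delays = visits[np.arange(n_users), first] - t0
    return delays[has_read]                    # users who never read are dropped
```

A histogram of the returned delays gives the simulated n(t); repeating the run for several values of p allows the decay exponents to be compared. Users who never visit after the release within the generated window are excluded, which is a limitation of this finite sketch.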
It is useful to characterize the interest in a news document 
by its halftime ($T_{1/2}$), corresponding to the time 
frame during which half of all visitors that eventually access 
it have visited. We find that the overall halftime 
distribution follows a power law (Fig. 4b), indicating 
that while most news have a very short lifetime, a few 
continue to be accessed well beyond their initial release. 
The average halftime of a news document is 36 hours, i.e. 
after a day and a half the interest in most news fades. 
A similar broad distribution is observed when we inspect 
the total number of visits a news document receives (Fig. 
7), indicating that the vast majority of news generate lit- 
tle interest, while a few are highly popular [41]. Similar 
weight distributions are observed in a wide range of com- 
plex networks [42-46]. 
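
The halftime defined above can be computed directly from the visit times of a document. The sketch below assumes that each recorded visit corresponds to a distinct visitor, as in the model where users read a document only once; the function name is hypothetical.

```python
import numpy as np

def halftime(visit_times, release_time):
    """Time after release by which half of all recorded visits have occurred."""
    delays = np.sort(np.asarray(visit_times, dtype=float) - release_time)
    if delays.size == 0:
        raise ValueError("no visits recorded")
    # Index of the visit that brings the cumulative count to at least one half.
    half_index = int(np.ceil(delays.size / 2)) - 1
    return float(delays[half_index])
```

For visits recorded 1, 2, 3, 10 and 100 hours after release, the function returns 3.0: the single late visit at 100 hours does not affect the halftime, although it would dominate the mean delay.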




FIG. 7. The distribution of the total number of visits dif- 
ferent news documents receive during a month. The tail of 
the distribution follows a power law with exponent 1.5. 

The short display time of a given news document, combined 
with the uneven visitation pattern, indicates that 
users could miss a significant fraction of the news by not 
visiting the portal when a document is displayed. We 
find that a typical user sees only 53% of all news items 
appearing on the main page of the portal, and downloads 
(reads) only 7% of them. Such shallow news penetration 
is likely common in all media, but hard to quantify in 
the absence of tools to track the reading patterns of in- 
dividuals. 



III. DISCUSSION 

Our main goal in this paper was to explore the inter- 
play between individual human visitation patterns and 
the visitation of specific websites on a web portal. While 
we often tend to think that the visitation of a given doc- 
ument is driven only by its popularity, our results offer 
a more complex picture: the dynamics of its accessibility 
is equally important. Indeed, while "fifteen minutes of 
fame" does not yet apply to the online world, our measurements 
indicate that the visitation of most news items 
decays significantly after 36 hours of posting. The aver- 
age lifetime must vary for different media, but the decay 
laws we identified are likely generic, as they do not de- 
pend on content, but are determined mainly by the users' 
visitation and browsing patterns [28]. These findings also 
offer a potential explanation of the observation that the 
visitation of a website decreases as a power law follow- 
ing a peak of visitation after the site was featured in the 
media [47]. Indeed, the observed power law decay most 
likely characterizes the dynamics of the original news ar- 
ticle, which, due to the uneven visitation patterns of the 
users, displays a power law visitation decay (see eq. (4)). 

These results are likely not limited to news portals. Indeed, 
we are faced with an equally dynamic network when 
we look at commercial sites, where items are being taken 
off the website as they are either sold or not carried any 
longer. It is very likely that the visitation of the individ- 
ual users to such commercial sites also follows a power 
law interevent time, potentially leading to a power law 
decay in an item's visitation. The results might be ap- 
plicable to biological systems as well, where the stable 
network represents the skeleton of the regulatory or the 
metabolic network, indicating which nodes could interact 
[45,7], while the rapidly changing nodes correspond to the 
actual molecules that are present in a given moment in 
the cell. As soon as a molecule is consumed by a reaction 
or transported out of the cell, it disappears from the system. 
Before that happens, however, it can take part in 
multiple interactions. Indeed, there is increasing experimental 
evidence that network usage in biological systems 
is highly time dependent [48,49]. 

While most research on information access focuses on 
search engines [50], a significant fraction of new information 
we are exposed to comes from news, whose source is 
increasingly shifting online from the traditional printed 
and audiovisual media. News, however, have a fleeting 
quality: in contrast with the 24-hour news cycle of the 
printed press, in the online and audiovisual media the 
non-stop stream of new developments often obliterates 
a news event within hours. Through archives the Internet 
offers better long-term search-based access to old 
events than any other medium before. Yet, if we are not 
exposed to a news item while prominently featured, it is 
unlikely that we will know what to search for. The ac- 
celerating news cycle raises several important questions: 
How long is a piece of news accessible without targeted 
search? What is the dynamics of news accessibility? The 
results presented above show that the online media allows 
us to address these questions in a quantitative manner, 
offering surprising insights into the universal aspects of 
information dynamics. Such quantitative approaches to 
online media not only offer a better understanding of in- 
formation access, but could have important commercial 
applications as well, from better portal design to under- 
standing information diffusion [51-53], flow [54] and mar- 
keting in the online world. 